A hierarchical Viterbi algorithm for Mandarin hybrid speech synthesis system
Abstract
The hybrid speech synthesis system, which combines the hidden Markov model with the unit selection method, has become another mainstream approach in state-of-the-art TTS systems. However, the traditional Viterbi algorithm is based on global minimization of a cost function, so the search can end up selecting some poor-quality units with large local errors, which listeners can hardly tolerate. In Mandarin and many other languages, the naturalness of regions of consecutive voiced speech segments (CVS) is especially important to the overall quality of the synthetic speech. In this paper, we therefore propose a hierarchical Viterbi algorithm that involves two rounds of Viterbi search: one for the sub-paths within the CVS regions, and the other for the utterance path connecting all the sub-paths. We define a CVS region as a region that is formed by two or more voiced phones and whose pitch observations are continuous-valued. Subjective evaluations suggest that the hierarchical Viterbi algorithm outperforms the traditional algorithm in the Mandarin hybrid speech synthesis system in both the naturalness and the speech quality of the synthetic speech.
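The two-round search described in the abstract can be illustrated with a small Python sketch. The abstract does not specify the cost functions or how the sub-paths are connected, so everything here is an assumption made for illustration: units are plain numbers, `target` and `concat` are hypothetical toy costs, and the first round is assumed to fix a single best sub-path per CVS region before the second round searches the whole utterance.

```python
def viterbi(cands, target, concat, offset=0):
    """Return the minimum-cost path, one candidate per position.

    cands[i] lists the candidate units at position i; target(pos, unit) and
    concat(prev_unit, unit) are cost functions supplied by the caller.
    """
    # best[i][k] = (cost of the best path ending in cands[i][k], back-pointer)
    best = [[(target(offset, u), None) for u in cands[0]]]
    for i in range(1, len(cands)):
        row = []
        for u in cands[i]:
            cost, back = min(
                (best[i - 1][j][0] + concat(p, u), j)
                for j, p in enumerate(cands[i - 1])
            )
            row.append((cost + target(offset + i, u), back))
        best.append(row)
    # Trace back from the cheapest final state.
    k = min(range(len(best[-1])), key=lambda j: best[-1][j][0])
    path = []
    for i in range(len(cands) - 1, -1, -1):
        path.append(cands[i][k])
        k = best[i][k][1]
    return path[::-1]


def hierarchical_viterbi(cands, target, concat, cvs_regions):
    """Two rounds of search (assumed reading of the abstract).

    Round 1 finds the best sub-path inside each CVS region, given as
    (start, end) with end exclusive. Round 2 searches the whole utterance
    with the CVS positions restricted to the units chosen in round 1.
    """
    constrained = list(cands)
    for start, end in cvs_regions:
        sub_path = viterbi(cands[start:end], target, concat, offset=start)
        for i, unit in zip(range(start, end), sub_path):
            constrained[i] = [unit]
    return viterbi(constrained, target, concat)


# Toy data (hypothetical): positions 0-1 form one CVS region.
goal = [1, 1, 6]
cands = [[1, 5], [1, 5], [6]]


def target(pos, unit):
    return abs(unit - goal[pos])      # distance from the desired value


def concat(prev_unit, unit):
    return 3 * abs(unit - prev_unit)  # penalty for a discontinuous join


global_path = viterbi(cands, target, concat)
hier_path = hierarchical_viterbi(cands, target, concat, [(0, 2)])
```

On this toy input the global search returns `[5, 5, 6]`, accepting two poorly matching units inside the CVS region to get a cheaper join, whereas the hierarchical search returns `[1, 1, 6]`, keeping the well-matching units in the CVS region at the cost of a worse join at its boundary.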
Similar resources
Continuous Mandarin speech recognition using hierarchical recurrent neural networks
An ANN-based continuous Mandarin base-syllable recognition system is proposed. It adopts a hybrid approach that combines an HRNN with a Viterbi search. The HRNN serves as a front-end processor and is responsible for calculating discrimination scores for all 411 base-syllables. A Viterbi search then follows to find the best base-syllable sequence, the one with the highest score, as the recognized output....
An On-the-Fly Mandarin Singing Voice Synthesis System
An on-the-fly Mandarin singing voice synthesis system, called SINVOIS (singing voice synthesis), is proposed in this paper. The SINVOIS system can receive the continuous speech of the lyrics of a song, and generate the singing voice immediately based on the music score information (embedded in a MIDI file) of the song. Two sub-systems are designed and embedded into the system. One is the synthe...
A Unit Selection-based Speech Synthesis Approach for Chinese Mandarin Text-to-Speech
The paper presents a unit selection-based speech synthesis approach for Chinese Mandarin. The unit selection-based approach generates speech by directly connecting pre-recorded speech units. In this approach, a corpus is used as the source unit inventory. A feature vector is defined to describe each unit. To generate speech, the feature vector of the target unit is first calculated. During synthesis,...
A Unit Selection-based Speech Synthesis Approach for Mandarin Chinese
The paper presents a unit selection-based speech synthesis approach for Mandarin Chinese. The unit selection-based approach generates speech by selecting proper units from a speech corpus and connecting them together. In this approach, a set of features is defined to describe the speech units in the corpus and the expected units in the synthesized utterance. Based on the features, cost function is...
An MRNN-based method for continuous Mandarin speech recognition
A new MRNN-based method for continuous Mandarin speech recognition is proposed. The system uses five RNNs to accomplish several subtasks separately and then combines them to solve the problem as a whole. They include two RNNs for the discrimination of the two sub-syllable groups of 100 RFD initials and 39 CI finals, two RNNs for the generation of dynamic weighting functions for sub-syllable’s int...